Bangla Part of Speech Tagging Using Contextual Embeddings and Oversampling Techniques
Koushik Roy, Md Hasan, K M Faizullah Fuhad, Nabeel Mohammed, A K M Shahariar Azad Rabby, Nazmul Hasan, Jebun Nahar, Fuad Rahman
Accepted to be presented at FTC 2020 - Future Technologies Conference 2020, 5-6 November 2020, Vancouver, Canada
Description
Part of Speech (PoS) tagging is a long-standing research area in
Natural Language Processing. The popularization of neural networks has
opened substantially more scope for research on Bangla PoS tagging,
especially with sequential models such as Recurrent Neural Networks,
including Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU)
networks. Our contribution in this paper is to transform the sequential
modeling problem into a non-sequential one using BERT embeddings, so
that existing, well-understood oversampling algorithms can be applied to
improve PoS tagging with a shallow feed-forward neural network. Our
experimental results indicate that the Synthetic Minority Over-sampling
Technique (SMOTE) works well as an oversampling algorithm for BERT
embeddings.
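
To illustrate the kind of oversampling the abstract refers to, here is a minimal sketch of SMOTE-style interpolation on fixed-length vectors. It is not the authors' implementation: the function name `smote_oversample`, the neighbour count `k`, the 8-dimensional random vectors standing in for per-token BERT embeddings, and the class sizes are all assumptions made for the example.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic vectors by interpolating between minority
    samples and their k nearest minority-class neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical stand-ins for per-token embeddings of a frequent and a rare tag.
# Real BERT vectors have hundreds of dimensions; 8 keeps the example small.
rng = random.Random(42)
dim = 8
frequent_tag = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(20)]
rare_tag = [[rng.gauss(3.0, 1.0) for _ in range(dim)] for _ in range(4)]

# Generate enough synthetic rare-tag vectors to match the frequent tag's count.
new_rare = smote_oversample(rare_tag, n_new=len(frequent_tag) - len(rare_tag))
balanced_rare = rare_tag + new_rare
```

Because each token is represented by a single fixed-length vector, the balanced set could then be passed to any ordinary classifier, such as a shallow feed-forward network. Interpolating within a tagged sequence is far less straightforward, which is presumably why the non-sequential formulation makes standard oversampling applicable.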